Data-Efficient Policy Search using PILCO and Directed-Exploration

Authors

  • Rowan McAllister
  • Mark van der Wilk
  • Carl Edward Rasmussen
Abstract

Reinforcement learning (RL) algorithms solve general sequential decision-making problems through learning by trial and error. Many reinforcement learning algorithms are proven to find a good or optimal controller, but may take many interactions with the environment to do so. For real-world tasks, this is often impractical, as letting a learner interact with the environment takes time and can be costly.

Similar resources

PILCO: A Model-Based and Data-Efficient Approach to Policy Search

In this paper, we introduce PILCO, a practical, data-efficient model-based policy search method. PILCO reduces model bias, one of the key problems of model-based reinforcement learning, in a principled way. By learning a probabilistic dynamics model and explicitly incorporating model uncertainty into long-term planning, PILCO can cope with very little data and facilitates learning from scratch ...

Data-Efficient Reinforcement Learning in Continuous-State POMDPs

We present a data-efficient reinforcement learning algorithm resistant to observation noise. Our method extends the highly data-efficient PILCO algorithm (Deisenroth & Rasmussen, 2011) into partially observed Markov decision processes (POMDPs) by considering the filtering process during policy evaluation. PILCO conducts policy search, evaluating each policy by first predicting an analytic distr...

Probabilistic Inference for Fast Learning in Control

How can we learn control tasks as fast as possible given knowledge from experience only?

  • autonomous learning in control from scratch using experience only (no demonstrations)
  • no task-specific prior assumptions
  • learn fast (data efficient) model-based RL
  • deal with model bias during long-term planning: only small data sets available for learning dynamics models

1 Key Idea and Algorithm

  • lear...

Data-Efficient Reinforcement Learning in Continuous State-Action Gaussian-POMDPs

We present a data-efficient reinforcement learning method for continuous state-action systems under significant observation noise. Data-efficient solutions under small noise exist, such as PILCO which learns the cartpole swing-up task in 30s. PILCO evaluates policies by planning state-trajectories using a dynamics model. However, PILCO applies policies to the observed state, therefore planning i...

Safe Policy Search with Gaussian Process Models

We propose a method to optimise the parameters of a policy which will be used to safely perform a given task in a data-efficient manner. We train a Gaussian process model to capture the system dynamics, based on the PILCO framework. Our model has useful analytic properties, which allow closed form computation of error gradients and estimating the probability of violating given state space const...

Publication date: 2016